

Dynamics of Stochastic Momentum Methods on Large-scale, Quadratic Models

Neural Information Processing Systems

We analyze a class of stochastic gradient algorithms with momentum on a high-dimensional random least squares problem. Our framework, inspired by random matrix theory, provides an exact (deterministic) characterization for the sequence of function values produced by these algorithms, expressed only in terms of the eigenvalues of the Hessian. This leads to simple expressions for nearly-optimal hyperparameters, a description of the limiting neighborhood, and average-case complexity. As a consequence, we show that (small-batch) stochastic heavy-ball momentum with a fixed momentum parameter provides no actual performance improvement over SGD when step sizes are adjusted correctly. In contrast, in the non-strongly convex setting, it is possible to get a large improvement over SGD using momentum. By introducing hyperparameters that depend on the number of samples, we propose a new algorithm sDANA (stochastic dimension adjusted Nesterov acceleration) which obtains an asymptotically optimal average-case complexity while remaining linearly convergent in the strongly convex setting without adjusting parameters.
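The setting can be sketched in a few lines. This is a minimal illustration, not the paper's experiments: the problem sizes, step size, and momentum parameter below are invented for the example, and the design matrix is a generic Gaussian rather than the paper's precise random-matrix model.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 50
A = rng.standard_normal((n, d))          # random design matrix
x_star = rng.standard_normal(d)
b = A @ x_star                           # noiseless targets (interpolation holds)

def loss(x):
    r = A @ x - b
    return 0.5 * np.mean(r * r)

def sgd_heavy_ball(steps=1500, lr=0.01, beta=0.9, batch=10):
    """Small-batch SGD with a fixed heavy-ball momentum parameter."""
    x, v = np.zeros(d), np.zeros(d)
    history = [loss(x)]
    for _ in range(steps):
        idx = rng.integers(0, n, size=batch)
        # unbiased stochastic gradient of the mean squared loss
        g = A[idx].T @ (A[idx] @ x - b[idx]) / batch
        v = beta * v + g                 # momentum buffer, fixed beta
        x = x - lr * v
        history.append(loss(x))
    return x, history

x_hb, hist = sgd_heavy_ball()
```

The paper's claim is about this regime: with the step size tuned correctly, the trajectory of `loss` values here matches plain SGD's in the high-dimensional limit.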



Low-dimensional models of neural population activity in sensory cortical circuits

Evan W. Archer, Urs Koster, Jonathan W. Pillow, Jakob H. Macke

Neural Information Processing Systems

Neural responses in visual cortex are influenced by visual stimuli and by ongoing spiking activity in local circuits. An important challenge in computational neuroscience is to develop models that can account for both of these features in large multi-neuron recordings and to reveal how stimulus representations interact with and depend on cortical dynamics. Here we introduce a statistical model of neural population activity that integrates a nonlinear receptive field model with a latent dynamical model of ongoing cortical activity. This model captures temporal dynamics and correlations due to shared stimulus drive as well as common noise. Moreover, because the nonlinear stimulus inputs are mixed by the ongoing dynamics, the model can account for multiple idiosyncratic receptive field shapes with a small number of nonlinear inputs to a low-dimensional dynamical model. We introduce a fast estimation method using online expectation maximization with Laplace approximations, for which inference scales linearly in both population size and recording duration. We apply this model to multi-channel recordings from primary visual cortex and show that it accounts for neural tuning properties as well as cross-neural correlations.
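The generative structure described here can be sketched schematically. All dimensions, nonlinearities, and parameter scales below are invented for illustration; the point is only the architecture: nonlinear stimulus inputs are mixed by low-dimensional latent linear dynamics, which in turn drive Poisson spiking.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n_neurons, n_latent, n_inputs = 500, 30, 3, 4

# stable latent linear dynamics (spectral radius 0.9)
A = 0.9 * np.linalg.qr(rng.standard_normal((n_latent, n_latent)))[0]
B = 0.5 * rng.standard_normal((n_latent, n_inputs))   # mixes stimulus inputs into latents
C = 0.5 * rng.standard_normal((n_neurons, n_latent))  # loadings onto neurons
d0 = -1.0 * np.ones(n_neurons)                        # baseline log rates

stim = rng.standard_normal((T, n_inputs))
f = np.tanh(stim)            # stand-in for the nonlinear receptive-field outputs

z = np.zeros(n_latent)
spikes = np.zeros((T, n_neurons), dtype=int)
for t in range(T):
    z = A @ z + B @ f[t] + 0.1 * rng.standard_normal(n_latent)  # latent dynamics + noise
    rate = np.exp(C @ z + d0)                                    # log-linear firing rates
    spikes[t] = rng.poisson(rate)
```

Because every neuron's rate depends on the shared latents, the simulated population exhibits both stimulus-driven and noise correlations, which is the phenomenon the model is built to capture.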



Enhancing Trust-Region Bayesian Optimization via Newton Methods

Chen, Quanlin, Chen, Yiyu, Huo, Jing, Ding, Tianyu, Gao, Yang, Chen, Yuetong

arXiv.org Machine Learning

Bayesian Optimization (BO) has been widely applied to optimize expensive black-box functions while retaining sample efficiency. However, scaling BO to high-dimensional spaces remains challenging. Existing literature proposes performing standard BO in multiple local trust regions (TuRBO) for heterogeneous modeling of the objective function and avoiding over-exploration. Despite its advantages, using local Gaussian Processes (GPs) reduces sampling efficiency compared to a global GP. To enhance sampling efficiency while preserving heterogeneous modeling, we propose to construct multiple local quadratic models using gradients and Hessians from a global GP, and to select new sample points by solving a bound-constrained quadratic program. Additionally, we address the issue of vanishing gradients of GPs in high-dimensional spaces. We provide a convergence analysis and demonstrate through experimental results that our method enhances the efficacy of TuRBO and outperforms a wide range of high-dimensional BO techniques on synthetic functions and real-world applications.
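The bound-constrained quadratic-program step can be sketched with a generic projected-gradient solver. This is not the paper's solver: the gradient `g` and Hessian `H` below merely stand in for derivatives of a global GP posterior mean, and the sketch assumes a positive-semidefinite `H` (a convex local model), which need not hold for a real GP Hessian.

```python
import numpy as np

def solve_box_qp(g, H, radius, iters=1000):
    """Minimize m(s) = g@s + 0.5*s@H@s subject to |s_i| <= radius.

    Projected gradient with a 1/L step size; assumes H is positive
    semidefinite so the local model is convex."""
    L = np.linalg.norm(H, 2) + 1e-12     # Lipschitz constant of the model gradient
    s = np.zeros_like(g)
    for _ in range(iters):
        s = s - (g + H @ s) / L          # gradient step on the quadratic model
        s = np.clip(s, -radius, radius)  # project onto the trust-region box
    return s

# toy local model standing in for GP-derived derivatives at a trust-region center
rng = np.random.default_rng(4)
M = rng.standard_normal((5, 5))
H = M @ M.T + np.eye(5)                  # positive definite Hessian
g = rng.standard_normal(5)
s_star = solve_box_qp(g, H, radius=0.3)
```

The returned step `s_star` is the candidate the trust-region loop would evaluate next; the box radius plays the role of the trust-region size.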


Scaling Law for Stochastic Gradient Descent in Quadratically Parameterized Linear Regression

Ding, Shihong, Zhang, Haihan, Zhao, Hanzhen, Fang, Cong

arXiv.org Artificial Intelligence

In machine learning, the scaling law describes how model performance improves as model and data size scale up. From a learning theory perspective, this class of results establishes upper and lower generalization bounds for a specific learning algorithm. Here, the exact algorithm run under a specific model parameterization often provides a crucial implicit regularization effect, leading to good generalization. To characterize the scaling law, previous theoretical studies mainly focus on linear models, whereas feature learning, a process central to the remarkable empirical success of neural networks, remains largely unaddressed. This paper studies the scaling law for linear regression with a quadratically parameterized model. We consider infinite-dimensional data and ground-truth signal, both exhibiting certain power-law decay rates. We study convergence rates for Stochastic Gradient Descent and demonstrate that the learning rates for the variables automatically adapt to the ground truth. As a result, in the canonical linear regression, we provide explicit separations between the generalization curves of SGD with and without feature learning, and the information-theoretic lower bound that is agnostic to the parameterization method and the algorithm. Our analysis of decaying ground truth provides a new characterization of the model's learning dynamics.
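A finite-dimensional toy version of the parameterization makes the mechanism concrete. Everything here is an illustrative simplification: the paper's setting is infinite-dimensional with power-law decay, whereas this sketch uses a sparse ground truth and the nonnegative form θ = u ⊙ u, whose small initialization is what produces the implicit-regularization effect.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 1000, 100
theta_star = np.zeros(d)
theta_star[:5] = 1.0                     # fast-decaying (here: sparse) ground truth
X = rng.standard_normal((n, d))
y = X @ theta_star                       # noiseless linear regression data

u = np.full(d, 0.1)                      # small init: source of implicit regularization
lr = 0.001
for _ in range(20000):
    i = rng.integers(n)                  # single-sample SGD
    resid = X[i] @ (u * u) - y[i]        # quadratically parameterized model: theta = u ⊙ u
    u -= lr * 2.0 * u * (resid * X[i])   # chain rule through theta = u**2
theta_hat = u * u
```

Coordinates aligned with the signal grow multiplicatively while the rest stay near zero, so the per-coordinate effective learning rate adapts to the ground truth, which is the adaptation phenomenon the abstract refers to.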



Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

Summary: The paper introduces an algorithm (MORE) for black box optimization that constructs local "quadratic surrogate" models based on recent observations. By using a (simpler than the true function) quadratic model, the iterative step of refining the reward parameters can be computed in closed form (though the models themselves seem to be built with a form of sampling). This approach allows for a more sample efficient and robust search procedure, which is shown to outperform state-of-the-art methods in terms of samples and converged parameters on a number of simple functions as well as some very complex robotics tasks. Review: The new algorithm performs much better than the state-of-the-art algorithms in a wide range of experiments and is applicable in a very important problem setting. I appreciate the wide range of problems that the authors used and I think they make a very strong case for the new algorithm.
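The surrogate-building step the summary describes can be sketched generically: a plain least-squares fit of a quadratic model to recent observations. MORE itself additionally maintains a Gaussian search distribution and refines it with a KL-constrained closed-form update, which is omitted here.

```python
import numpy as np

def fit_quadratic_surrogate(X, f):
    """Least-squares fit of f(x) ≈ c + b@x + 0.5*x@A@x from samples (X: n×d)."""
    n, d = X.shape
    iu = np.triu_indices(d)
    # features: 1, x_i, and the upper-triangular monomials x_i * x_j (i <= j)
    quad = np.stack([X[:, i] * X[:, j] for i, j in zip(*iu)], axis=1)
    Phi = np.hstack([np.ones((n, 1)), X, quad])
    w, *_ = np.linalg.lstsq(Phi, f, rcond=None)
    c, b = w[0], w[1:1 + d]
    A = np.zeros((d, d))
    A[iu] = w[1 + d:]
    A = A + A.T                          # doubles the diagonal, as 0.5*x@A@x requires
    return A, b, c

# recover a known quadratic from noise-free evaluations
rng = np.random.default_rng(5)
d = 4
H = np.diag([1.0, 2.0, 3.0, 4.0])
g = np.array([1.0, -1.0, 0.5, 2.0])
X = rng.standard_normal((100, d))
f = 0.5 * np.einsum('ni,ij,nj->n', X, H, X) + X @ g + 3.0
A_hat, b_hat, c_hat = fit_quadratic_surrogate(X, f)
x_min = -np.linalg.solve(A_hat, b_hat)   # closed-form minimizer of the surrogate
```

This is the sense in which the quadratic model buys a closed form: once `A_hat` and `b_hat` are in hand, the model's minimizer (or a distribution update built from them) needs no further function evaluations.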


Data-driven system identification using quadratic embeddings of nonlinear dynamics

Klus, Stefan, N'Konzi, Joel-Pascal

arXiv.org Machine Learning

We propose a novel data-driven method called QENDy (Quadratic Embedding of Nonlinear Dynamics) that not only allows us to learn quadratic representations of highly nonlinear dynamical systems, but also to identify the governing equations. The approach is based on an embedding of the system into a higher-dimensional feature space in which the dynamics become quadratic. Just like SINDy (Sparse Identification of Nonlinear Dynamics), our method requires trajectory data, time derivatives at the training data points (which can also be estimated using finite-difference approximations), and a set of preselected basis functions, called a dictionary. We illustrate the efficacy and accuracy of QENDy with the aid of various benchmark problems and compare its performance with SINDy and a deep learning method for identifying quadratic embeddings. Furthermore, we analyze the convergence of QENDy and SINDy in the infinite data limit, highlight their similarities and main differences, and compare the quadratic embedding with linearization techniques based on the Koopman operator.
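The regression step shared by QENDy and SINDy can be illustrated on a system that is already quadratic, the logistic ODE x' = x − x²; for a general system, QENDy would first embed the state so that the lifted dynamics become quadratic. The dictionary below is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(3)
# trajectory data from the logistic ODE  x' = x - x^2  (already quadratic)
x = rng.uniform(0.05, 0.95, size=200)
dx = x - x**2                      # assume exact time derivatives are available

# dictionary of preselected basis functions: constant, linear, quadratic
Phi = np.stack([np.ones_like(x), x, x**2], axis=1)
coef, *_ = np.linalg.lstsq(Phi, dx, rcond=None)
# coef recovers [0, 1, -1]: the governing equation x' = x - x^2
```

With noise-free derivatives the least-squares fit recovers the governing equation exactly; in practice the derivatives would be finite-difference estimates and the fit only approximate.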

